Switching Dynamic System Models for Speech Articulation and Acoustics
Abstract
A statistical generative model for the speech process is described that embeds a substantially richer structure than the HMM currently in predominant use for automatic speech recognition. This switching dynamic-system model generalizes and integrates the HMM and the piecewise stationary nonlinear dynamic system (state-space) model. Depending on the level and the nature of the switching in the model design, various key properties of the speech dynamics can be naturally represented in the model. Such properties include the temporal structure of the speech acoustics, its causal articulatory movements, and the control of such movements by the multidimensional targets correlated with the phonological (symbolic) units of speech in terms of overlapping articulatory features. One main challenge of using this multi-level switching dynamic-system model for successful speech recognition is that inference (decoding) of the posterior probabilities of the hidden states is computationally intractable. This in turn makes optimal parameter learning (training) computationally intractable. Several versions of Bayesian networks have been devised, with detailed dependency implementation specified, to represent the switching dynamic-system model of speech. We discuss the variational technique developed for general Bayesian networks as a suboptimal solution to the decoding and learning problems. Some common operations for estimating the switching times of phonological states are shared between the variational technique and the human auditory function that uses neural transient responses to detect temporal landmarks associated with phonological features. This suggests that variational-style learning may actually take place in human speech perception under an encoding-decoding theory of speech communication, which highlights the critical role of modeling articulatory dynamics for speech recognition and which forms a main motivation for the switching dynamic system model described in this chapter.
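To make the generative structure described in the abstract concrete, the following is a minimal sketch of a toy switching dynamic system, not the chapter's actual model. It assumes a scalar hidden state, a two-regime Markov chain, a first-order target-directed update, and a `tanh` observation function. All names (`simulate_switching_system`, `trans`, `targets`, `rate`) and all numeric values are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_switching_system(n_steps, trans, targets, rate=0.8,
                              state_noise=0.05, obs_noise=0.05, seed=0):
    """Sample regimes, hidden states, and observations from a toy model.

    Assumed (illustrative) structure:
      s_t ~ Markov chain with transition matrix `trans`
      x_t = rate * x_{t-1} + (1 - rate) * targets[s_t] + Gaussian noise
      y_t = tanh(x_t) + Gaussian noise
    """
    rng = random.Random(seed)
    s = 0      # discrete regime, standing in for a phonological unit
    x = 0.0    # continuous hidden state, standing in for an articulatory variable
    regimes, hidden, observed = [], [], []
    for _ in range(n_steps):
        # Discrete switching: first-order Markov chain over regimes, as in an HMM.
        s = rng.choices(range(len(trans)), weights=trans[s])[0]
        # Continuous dynamics: the hidden state moves toward the regime's target.
        x = rate * x + (1.0 - rate) * targets[s] + rng.gauss(0.0, state_noise)
        # Nonlinear observation: a stand-in for an articulatory-to-acoustic mapping.
        y = math.tanh(x) + rng.gauss(0.0, obs_noise)
        regimes.append(s)
        hidden.append(x)
        observed.append(y)
    return regimes, hidden, observed


# Hypothetical two-regime example with "sticky" transitions.
trans = [[0.95, 0.05], [0.05, 0.95]]
targets = [-1.0, 1.0]
regimes, hidden, observed = simulate_switching_system(200, trans, targets)
```

Sampling from such a model is cheap. The difficulty noted in the abstract lies in the reverse direction: recovering the posterior over `regimes` and `hidden` from `observed` alone, since the number of possible regime sequences grows exponentially with the sequence length.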
Similar resources
Improving on Hidden Markov Models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding
The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve o...
Comparison of Motor Skills Among Students with Intellectual Disability, Stuttering, Articulation Problems and Normal Speech
Objective: This research aimed to compare the motor skills among students with intellectual disability, stuttering, articulation problems and normal speech. Methods: The study was a retrospective causal-comparative research. From among all elementary male students with intellectual disability in Urmia city, 90 students (30 students in each group) were selected. All groups completed the revised ...
Speech and Reading Disorders Screening, and Problems in Structure and Function of Articulation Organs in Children in Mashhad City, Iran
Background and Objectives: Investigating the prevalence of speech and language disorders and the contributing factors can help determine the best treatment options suited to the needs of these patients. So far, no comprehensive study has been conducted on screening speech and reading disorders and problems in the structure and function of articulation organs (PSFAOs) in children in Mashhad City...
Modeling of Speech Production from the Perspective of Neuroscience
Recent neural models are capable of generating quantitative patterns of speech articulation and speech acoustics. Five models are discussed here: the DIVA model, the task dynamics model, the ACT model, the Warlaumont model and the Hickok model. These models have a more or less strong background in neuroscience. Directions are identified in this paper for a further development of quantitative pr...
Vocal tract representation in the recognition of cerebral palsied speech.
PURPOSE In this study, the authors explored articulatory information as a means of improving the recognition of dysarthric speech by machine. METHOD Data were derived chiefly from the TORGO database of dysarthric articulation (Rudzicz, Namasivayam, & Wolff, 2011) in which motions of various points in the vocal tract are measured during speech. In the 1st experiment, the authors provided a bas...
Publication date: 2001